Abstract

In this note we prove certain necessary and sufficient conditions for the existence of an embedding of statistical manifolds. In particular, we prove that any smooth (resp. $C^1$) statistical manifold can be embedded into the space of probability measures on a finite set. As a result, we get positive answers to the Lauritzen question on the realization of smooth (resp. $C^1$) statistical manifolds as statistical models.
1 Introduction

A statistical model is a family $M$ of probability measures on a measurable space $\Omega$. There are two natural geometric structures on any statistical model equipped with a differentiable manifold structure: the Fisher tensor and the Chentsov-Amari tensor.

The Fisher tensor was introduced by Fisher in 1925 as an information characterization of a statistical model. Rao [Rao(1945)] proposed to consider this tensor as a Riemannian metric on the manifold of probability distributions. This Fisher metric has been systematically studied in [Chentsov1972], [M-C 1990], [A-N2000] and elsewhere ([Lauritzen1987], [Rao1987], [Ay2002], [Jost2005], etc.) in the field of geometric aspects of statistics and information theory.

Chentsov [Chentsov1972] and Amari [Amari1997] independently discovered another natural structure on statistical models, namely a 1-parameter family of invariant connections, which includes the Levi-Civita connection of the Fisher metric. This family of invariant connections is defined by a 3-symmetric tensor $T$ together with the Levi-Civita connection of the Fisher metric.

Motivated by the question of how much of a statistical model can be described via its Fisher metric and its Chentsov-Amari tensor $T$, Lauritzen proposed in 1987 to call a Riemannian manifold $(M, g)$ with a 3-symmetric tensor $T$ a statistical manifold. Since two 3-symmetric tensors $T$ and $k \cdot T$, $k \neq 0$, define the same family of Chentsov-Amari connections, we say that two statistical manifolds $(M, g, T)$ and $(M, g, kT)$ are conformally equivalent.

A natural and important question in mathematical statistics is to understand whether a given family $M$ of probability distributions can be considered as a subfamily of another given one $N$. In the language of statistical manifolds, this question can be formulated as the problem of an isostatistical embedding of a statistical manifold $(M, g, T)$ into another one $(N, g', T')$. Here we say that an immersion $f : (M, g, T) \to (N, g', T')$ is isostatistical if $f^*(g') = g$ and $f^*(T') = T$.

We shall see in section 2 that the problem of the existence of an isostatistical embedding also includes the question posed by Lauritzen in 1987 of whether any statistical manifold is a statistical model. It also concerns the following important problem posed by Amari in 1997: whether any finite-dimensional statistical model can be embedded into the space $\mathrm{Cap}_N$ of probability distributions on the sample space $\Omega_N$ of $N$ elementary events for some finite $N$.

We shall construct a class of $C^0$ (and $C^1$) monotone invariants of statistical manifolds, which are obstructions to embedding a given $C^k$ statistical manifold $M$ into another one $N$. Here a $C^k$ statistical manifold $(M, g, T)$ is a smooth differentiable manifold with $C^k$ sections $g \in S^2T^*M$ and $T \in S^3T^*M$. These invariants measure certain relations between the metric tensor $g$ and the 3-symmetric tensor $T$. In particular, using these invariants we show that no statistical manifold which is conformally equivalent to the space $\mathrm{Cap}_N$ can be embedded into the product of $m$ copies of the normal Gaussian manifold for any $N > 3$ and any finite $m$. In the Main Theorem (section 5) we prove that any smooth (resp. $C^1$) statistical manifold $M^m$ can be isostatistically embedded into the space $\mathrm{Cap}_N$ for some sufficiently large $N$.

As a consequence we also get a new proof of Matsumoto's theorem on the existence of a contrast function for a statistical manifold (see 2.8).

Acknowledgement. I am thankful to Jürgen Jost and Nihat Ay for introducing me to the field of information geometry and for helpful discussions.

2 Statistical models and statistical manifolds. 

In this section we recall the definitions of the Fisher metric and the Chentsov-Amari connections on statistical models. We introduce the notions of a weak Fisher metric and a weak probability potential. At the end of the section we discuss the problem of whether a given statistical manifold is a statistical model. Most of the facts in this section can be found in [A-N2000].
Suppose that $M$ is a statistical model, i.e. a family of probability measures on a space $\Omega$. We assume throughout this note that $M$ and $\Omega$ are differentiable manifolds, and $\Omega$ is equipped with a fixed Borel measure $d\omega$. We also write

$$\mathbf{p}(x) = p(x,\omega)\,d\omega, \qquad (2.1)$$

where $\mathbf{p}(x)$ on the left-hand side of (2.1) is a Borel measure on $\Omega$ depending on $x \in M$, and $p(x,\omega)$ on the right-hand side of (2.1) is a non-negative (density) function on $M \times \Omega$ which satisfies

$$\int_\Omega p(x,\omega)\,d\omega = 1 \quad \forall x \in M. \qquad (2.1.a)$$

The Fisher metric $g^F(x)$ is defined on $M$ as follows. For any $V, W \in T_xM$ we put

$$g^F(V,W)_x = \int_\Omega \big(\partial_V \ln p(x,\omega)\big)\big(\partial_W \ln p(x,\omega)\big)\, p(x,\omega)\, d\omega. \qquad (2.2)$$

The function under the integral in (2.2) is well defined if

$$p(x,\omega) > 0. \qquad (2.1.b)$$

Denote by $\mathrm{Cap}(\Omega)$ the space of all probability measures on $\Omega$. Clearly we can consider the density function $p(x,\omega)$ as a mapping $M \to \mathrm{Cap}(\Omega)$. We call a function $p(x,\omega)$ a probability potential of the metric $g^F$ if $p(x,\omega)$ satisfies (2.1.a), (2.1.b) and (2.2). We shall see in Proposition 2.2 that for a given Riemannian metric $g$ on a smooth manifold $M$ there exist many probability potentials $p(x,\omega)$ for $g$, even if we fix the space $(\Omega, d\omega)$.

Sometimes it is useful to consider functions $p(x,\omega)$ which satisfy (2.2) and (2.1.b) but not necessarily (2.1.a). In this case the Riemannian metric $g^F$ is called a weak Fisher metric, and the function $p(x,\omega)$ is called a weak probability potential of $g^F$.

2.3. Example of a weak Fisher metric: the standard Euclidean metric $g^0$ on the positive quadrant $\mathbb{R}^N_+ = \{x_i > 0\}$. It is straightforward to check that $g^0$ admits the weak probability potential $\{p_i(x) = \tfrac14 x_i^2,\ i = 1, \dots, N\}$. Here $\Omega = \Omega_N$ is the sample space of $N$ elementary events (with the counting measure).
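A quick numerical sanity check of Example 2.3 can be sketched in Python (our choice of language; the note itself contains no code). The sketch assumes the counting measure on $\Omega_N$ and the potential $p_i(x) = \tfrac14 x_i^2$, and writes (2.2) and (2.6.1) for a finite sample space as sums over $i$; the helper names `weak_potential` and `fisher_and_amari` are hypothetical. Besides the metric, it evaluates the cubic form (2.6.1), whose only non-zero components come out as $T_{iii} = 2/x_i$.

```python
# Sketch: finite-difference check of Example 2.3 (illustration, not the author's code).
# Assumptions: counting measure on Omega_N and weak potential p_i(x) = x_i**2 / 4.


def weak_potential(x):
    """p_i(x) = x_i^2 / 4, one value per elementary event i."""
    return [xi * xi / 4.0 for xi in x]


def fisher_and_amari(potential, x, h=1e-6):
    """Return (g, T) at x for a family on a finite sample space:
    g_jk  = sum_i d_j p_i * d_k p_i / p_i              (form (2.2))
    T_jkl = sum_i d_j p_i * d_k p_i * d_l p_i / p_i**2  (form (2.6.1))
    """
    n = len(x)
    p = potential(x)
    # jac[j][i] approximates d p_i / d x_j by central differences
    jac = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        jac.append([(a - b) / (2 * h) for a, b in zip(potential(xp), potential(xm))])
    m = range(len(p))
    g = [[sum(jac[j][i] * jac[k][i] / p[i] for i in m) for k in range(n)]
         for j in range(n)]
    T = [[[sum(jac[j][i] * jac[k][i] * jac[l][i] / p[i] ** 2 for i in m)
           for l in range(n)] for k in range(n)] for j in range(n)]
    return g, T


if __name__ == "__main__":
    x = [0.5, 1.0, 2.0]
    g, T = fisher_and_amari(weak_potential, x)
    print(g)                                # close to the identity matrix
    print([T[i][i][i] for i in range(3)])   # close to [2/x_i] = [4, 2, 1]
```

The finite differences are only a convenience; for this quadratic potential the derivatives $\partial_j p_i = \delta_{ij} x_i/2$ could equally be written down by hand.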

2.4. The Fisher metric on the space $\mathrm{Cap}_N^+$ of all positive probability distributions on $\Omega_N$ (see also [A-N2000], [Jost2005], [Chentsov1972]). By definition we have

$$\mathrm{Cap}_N^+ := \Big\{(p_1, \dots, p_N) \;\Big|\; p_i > 0 \text{ for } i = 1, \dots, N,\ \sum_{i=1}^N p_i = 1\Big\}.$$

We define the embedding map

$$f : \mathrm{Cap}_N^+ \to S^{N-1}(2), \qquad (p_1, \dots, p_N) \mapsto (q_1 = 2\sqrt{p_1}, \dots, q_N = 2\sqrt{p_N}).$$

It is easy to see that the Fisher metric in the new coordinates $(q_i)$ is the standard metric of constant positive curvature on the sphere $S^{N-1}(2)$ of radius 2.
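The isometry claimed in 2.4 can be spot-checked numerically. The following is a minimal sketch, not a proof: it compares the Fisher metric $\sum_i V_iW_i/p_i$ at a random interior point of $\mathrm{Cap}_N^+$ with the Euclidean inner product of the pushed-forward vectors under $q_i = 2\sqrt{p_i}$, and checks that $|q| = 2$. The function names and the random test data are illustrative assumptions.

```python
# Sketch: the square-root map of 2.4 as an isometry onto the sphere of radius 2.
import math
import random


def fisher_metric(p, v, w):
    """(2.2) on Omega_N with counting measure: sum_i v_i w_i / p_i."""
    return sum(vi * wi / pi for pi, vi, wi in zip(p, v, w))


def sphere_map(p):
    """q_i = 2 sqrt(p_i)."""
    return [2.0 * math.sqrt(pi) for pi in p]


def pushforward(p, v, h=1e-7):
    """Differential of sphere_map at p applied to v (central differences)."""
    qp = sphere_map([pi + h * vi for pi, vi in zip(p, v)])
    qm = sphere_map([pi - h * vi for pi, vi in zip(p, v)])
    return [(a - b) / (2 * h) for a, b in zip(qp, qm)]


def random_point_and_tangent(n, rng):
    """A point of the open simplex and a tangent vector (entries summing to 0)."""
    raw = [rng.uniform(0.5, 1.5) for _ in range(n)]
    s = sum(raw)
    p = [r / s for r in raw]
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mean = sum(v) / n
    return p, [vi - mean for vi in v]


if __name__ == "__main__":
    rng = random.Random(0)
    p, v = random_point_and_tangent(5, rng)
    _, w = random_point_and_tangent(5, rng)
    euclid = sum(a * b for a, b in zip(pushforward(p, v), pushforward(p, w)))
    print(fisher_metric(p, v, w), euclid)                 # agree up to rounding
    print(math.sqrt(sum(q * q for q in sphere_map(p))))   # radius 2
```

Analytically $dq_i = V_i/\sqrt{p_i}$, so $\sum_i dq_i(V)\,dq_i(W) = \sum_i V_iW_i/p_i$; the numerical check merely confirms this identity.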

2.5. Divergence potential (see [A-N2000], [Rao(1987)]). A function $\rho$ on $M \times M$ with the property

$$\rho(x,y) \ge 0, \text{ with equality iff } x = y, \qquad (2.5.1)$$

is called a divergence function. A divergence function $\rho$ is called a divergence potential for a metric $g$ on $M$ if

$$g(X,Y)_x = \mathrm{Hess}(\rho)\big(i_1(X), i_1(Y)\big)_{(x,x)}, \qquad (2.5.2)$$

where

$$T_{(x,x)}(M \times M) = (T_xM, 0) \oplus (0, T_xM) = i_1(T_xM) \oplus i_2(T_xM).$$

An example of a divergence potential for a Fisher metric is the Jensen function $J_H(x,y)$ of the entropy function $H(x)$ on $M$, or the Kullback relative entropy function $K(x,y)$ on $M \times M$.
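For $\mathrm{Cap}_N^+$ the statement about the Kullback relative entropy can be illustrated numerically. The sketch below assumes $K(x,y) = \sum_i x_i \ln(x_i/y_i)$ and approximates the second derivative of $t \mapsto K(x + tV, x)$ at $t = 0$ by a central difference; analytically this derivative equals $\sum_i V_i^2/x_i$, which is the Fisher metric (2.2) on $\Omega_N$. The step size, sample point and helper names are arbitrary choices.

```python
# Sketch: the Hessian of K(., x) at the diagonal reproduces the Fisher metric.
import math


def kl(x, y):
    """Kullback relative entropy sum_i x_i ln(x_i / y_i)."""
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y))


def hess_first_slot(x, v, h=1e-3):
    """d^2/dt^2 K(x + t v, x) at t = 0, by a central second difference."""
    xp = [xi + h * vi for xi, vi in zip(x, v)]
    xm = [xi - h * vi for xi, vi in zip(x, v)]
    return (kl(xp, x) - 2.0 * kl(x, x) + kl(xm, x)) / h ** 2


def fisher(x, v):
    """g^F(v, v) at x on Omega_N: sum_i v_i^2 / x_i."""
    return sum(vi * vi / xi for xi, vi in zip(x, v))


if __name__ == "__main__":
    x = [0.1, 0.2, 0.3, 0.4]
    v = [0.05, -0.02, 0.01, -0.04]   # a tangent vector of Cap_4^+
    print(hess_first_slot(x, v), fisher(x, v))
```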

2.6. Chentsov-Amari connections. Let $p(x,\omega)$ be a probability potential for a Riemannian metric $g$. We define a symmetric 3-tensor $T$ on $M$ as follows:

$$T(X,Y,Z) = \int_\Omega \big(\partial_X \ln p(x,\omega)\big)\big(\partial_Y \ln p(x,\omega)\big)\big(\partial_Z \ln p(x,\omega)\big)\, p(x,\omega)\, d\omega. \qquad (2.6.1)$$

We denote by $\nabla^F$ the Levi-Civita connection of the (weak) Fisher metric $g^F$. We define

$$\langle \nabla^t_X Y, Z \rangle := \langle \nabla^F_X Y, Z \rangle + t \cdot T(X,Y,Z). \qquad (2.6.2)$$

The connections $\nabla^t$ are called the Chentsov-Amari connections.
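As a one-dimensional illustration of (2.2), (2.6.1) and (2.6.2), here is a sketch for the Bernoulli family $p(\theta) = (\theta, 1 - \theta)$ on a two-point sample space (our example, not one from the text). In dimension one the Levi-Civita symbol is $\Gamma_{11,1} = \tfrac12 g'(\theta)$, so (2.6.2) gives $\Gamma^t_{11,1} = \tfrac12 g'(\theta) + t\,T(\theta)$. Since $g(\theta) = 1/(\theta(1-\theta))$ and $T(\theta) = 1/\theta^2 - 1/(1-\theta)^2 = -g'(\theta)$, the symbol vanishes for $t = 1/2$, i.e. $\theta$ is an affine coordinate for $\nabla^{1/2}$. The function names are hypothetical.

```python
# Sketch: Fisher metric, cubic form and Chentsov-Amari symbol for the Bernoulli family.


def bernoulli_g_T(theta):
    """g and T from (2.2) and (2.6.1) for p(theta) = (theta, 1 - theta)."""
    p = [theta, 1.0 - theta]
    dp = [1.0, -1.0]                       # d p_i / d theta, exact
    g = sum(d * d / pi for d, pi in zip(dp, p))
    T = sum(d ** 3 / pi ** 2 for d, pi in zip(dp, p))
    return g, T


def christoffel(theta, t, h=1e-5):
    """Gamma^t_{11,1} = (1/2) g'(theta) + t * T(theta), with g' by central differences."""
    g_plus, _ = bernoulli_g_T(theta + h)
    g_minus, _ = bernoulli_g_T(theta - h)
    _, T = bernoulli_g_T(theta)
    return 0.5 * (g_plus - g_minus) / (2 * h) + t * T


if __name__ == "__main__":
    g, T = bernoulli_g_T(0.3)
    print(g, 1 / (0.3 * 0.7))              # Fisher metric 1/(theta(1-theta))
    print(T, 1 / 0.3**2 - 1 / 0.7**2)      # cubic form
    print(christoffel(0.3, 0.5))           # approximately 0
```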

2.6.3. Remark ([A-N2000], [Matsumoto1993]). Any divergence function $\rho(x,y)$ on $M \times M$ defines a tensor $T$ on $M$ via the following formula:

$$T(X,Y,Z)_x = -\partial_{i_2(Z)}\, \mathrm{Hess}(\rho)\big(i_1(X), i_1(Y)\big)_{(x,x)} + \partial_{i_1(Z)} \ldots$$

If $g$ and $T$ are defined by the same divergence function $\rho(x,y)$, we call $\rho(x,y)$ a divergence potential for the statistical manifold $(M, g, T)$. It is well known that the Kullback relative entropy function is a divergence potential for the associated statistical model.

2.7. Statistical submanifolds.

A submanifold $N$ of a statistical manifold $(M, g, T)$, equipped with the induced Riemannian metric $g_N$ and the induced tensor $T_N$, is called a statistical submanifold of $(M, g, T)$. Clearly, if $p(x,\omega)$ is a (weak) probability potential for $(M, g, T)$, then its restriction to any submanifold $N \subset M$ is a (weak) probability potential of the induced statistical structure.
2.8. Statistical models and statistical manifolds. Since any probability potential $p(x,\omega)$ defines a map $M \to \mathrm{Cap}(\Omega)$, we say that a statistical manifold $(M, g, T)$ is a statistical model if there exists a probability potential $p(x,\omega)$ for $g$ and $T$, i.e. $g$ and $T$ are given by (2.2) and (2.6.1). By the remark in 2.7, a statistical submanifold of a statistical model is also a statistical model. Furthermore, if a statistical manifold $(M, g, T)$ is a statistical model, then it admits a divergence potential, namely the Kullback relative entropy (see 2.6.3). Hence the following theorem of Matsumoto is a consequence of our Main Theorem in section 5.

2.8.1. Theorem ([Matsumoto1993]). For any statistical manifold $(M, g, T)$ there exists a divergence potential $\rho$ for $g$ and for $T$.

3 Embeddings of linear statistical spaces. 

A Euclidean space $(\mathbb{R}^n, g^0)$ equipped with a 3-symmetric tensor $T$ will be called a linear statistical space. We observe that the set of equivalence classes of $n$-dimensional linear statistical spaces coincides with the orbit space of 3-symmetric tensors $T$ under the action of the orthogonal group $O(n)$. In this section we discuss certain invariants of these orbits, and by studying these invariants we show several necessary and sufficient conditions for the existence of an embedding of one linear statistical space into another. One class of our necessary conditions consists of monotone invariants $\lambda$, i.e. we assign to any linear statistical space $(\mathbb{R}^n, g^0, T)$ a number $\lambda(\mathbb{R}^n, g^0, T)$ such that, if $(\mathbb{R}^n, g^0, T)$ is a statistical submanifold of $(\mathbb{R}^m, g^0, T')$, then

$$\lambda(\mathbb{R}^n, g^0, T) \le \lambda(\mathbb{R}^m, g^0, T').$$

Since a tangent space of a statistical manifold is a linear statistical space, these invariants play an important role in the problem of isostatistical immersions.

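3.1. Trace type of a symmetric 3-tensor. Let us denote by $\mathcal{R}^n$ the subspace of $S^3(\mathbb{R}^n)$ consisting of the following 3-symmetric tensors:

$$T_v(x,y,z) = \langle v,x\rangle\langle y,z\rangle + \langle v,y\rangle\langle x,z\rangle + \langle v,z\rangle\langle x,y\rangle,$$

where $v \in \mathbb{R}^n$. Using standard representation theory (see e.g. [O-N1988]) we have the decomposition

$$S^3(\mathbb{R}^n) = \mathcal{V}(3\pi_1) \oplus \mathcal{R}^n, \qquad (3.2)$$

where $\mathcal{V}(3\pi_1)$ denotes the irreducible $SO(n)$-module with highest weight $3\pi_1$.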

To compute the orthogonal projection of a 3-symmetric tensor $T$ onto the space $\mathcal{R}^n$ in the decomposition (3.2) we can use the following lemma. We denote by $\pi_2$ the orthogonal projection from $S^3(\mathbb{R}^n)$ to $\mathcal{R}^n$, and by $\pi_1$ the orthogonal projection to the first summand.

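3.3. Lemma. We have

$$\pi_2(S) = \frac{1}{n+2}\, T_{\mathrm{Tr}(S)}, \qquad (3.4)$$

where $\mathrm{Tr}(S)(z) := \sum_{i=1}^n S(e_i, e_i, z)$ for an orthonormal basis $(e_i)$ of $\mathbb{R}^n$. Here we identify the 1-form $\mathrm{Tr}(S)$ with a vector in $\mathbb{R}^n$ by using the Euclidean metric $g^0$.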

We omit the proof of Lemma 3.3, which is straightforward. In view of Lemma 3.3 we say that any tensor $T \in \mathcal{R}^n$ is of trace type.
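Lemma 3.3 is easy to test numerically: the difference $S - \pi_2(S)$ must be orthogonal to every $T_{e_m}$ (hence to every $T_v$), and $\pi_2$ must fix $\mathcal{R}^n$. The sketch below assumes that the canonical extension of $g^0$ to $S^3(\mathbb{R}^n)$ is the full contraction $\langle A, B\rangle = \sum_{ijk} A_{ijk}B_{ijk}$ and represents tensors as nested Python lists; all function names are hypothetical.

```python
# Sketch: numerical check of formula (3.4) for the projection onto trace-type tensors.
import itertools
import random


def symmetrize(A, n):
    """Average of A over the 6 permutations of its three indices."""
    S = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i, j, k in itertools.product(range(n), repeat=3):
        S[i][j][k] = sum(A[a][b][c] for a, b, c in itertools.permutations((i, j, k))) / 6.0
    return S


def trace_type(v, n):
    """Components of T_v: v_i d_jk + v_j d_ik + v_k d_ij."""
    def d(a, b):
        return 1.0 if a == b else 0.0
    return [[[v[i] * d(j, k) + v[j] * d(i, k) + v[k] * d(i, j)
              for k in range(n)] for j in range(n)] for i in range(n)]


def trace(S, n):
    """Tr(S)_k = sum_i S_iik."""
    return [sum(S[i][i][k] for i in range(n)) for k in range(n)]


def inner(A, B, n):
    return sum(A[i][j][k] * B[i][j][k] for i, j, k in itertools.product(range(n), repeat=3))


def pi2(S, n):
    """Formula (3.4): pi_2(S) = T_{Tr(S)} / (n + 2)."""
    return trace_type([t / (n + 2) for t in trace(S, n)], n)


if __name__ == "__main__":
    n, rng = 4, random.Random(0)
    S = symmetrize([[[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)] for _ in range(n)], n)
    P = pi2(S, n)
    R = [[[S[i][j][k] - P[i][j][k] for k in range(n)] for j in range(n)] for i in range(n)]
    basis = [[1.0 if a == m else 0.0 for a in range(n)] for m in range(n)]
    print(max(abs(inner(R, trace_type(e, n), n)) for e in basis))   # ~ 0
```

The check works because $\langle S, T_w\rangle = 3\langle w, \mathrm{Tr}(S)\rangle$ and $\mathrm{Tr}(T_v) = (n+2)v$.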

We note that

$$\dim S^3(\mathbb{R}^n) = C_n^3 + 2C_n^2 + n = \frac{n(n+1)(n+2)}{6}.$$

Thus the dimension of the quotient $S^3(\mathbb{R}^n)/SO(n)$ is at least $C_n^3 + C_n^2 + n$. A direct computation shows that the dimension of the orbit $SO(n)\big(\sum_{i=1}^n a_i (v_i)^3\big)$ is $C_n^2 = \dim SO(n)$ if $\prod_i a_i \neq 0$. Here $\{v_i\}$ is an orthonormal basis of $\mathbb{R}^n$. Hence $\dim S^3(\mathbb{R}^n)/O(n) = C_n^3 + C_n^2 + n$. This dimension is exactly the number of complete invariants of a pair consisting of a positive definite bilinear form $g$ and a 3-symmetric tensor $T$.

Since $\dim Gr_k(\mathbb{R}^n) = k(n-k)$, it follows that generically it is impossible to embed a linear statistical space $(\mathbb{R}^k, g^0, T)$ into a given linear statistical space $(\mathbb{R}^n, g^0, T')$ unless $k(n-k) \ge C_k^3 + C_k^2 + k$. Clearly this dimension condition is not sufficient, as the following proposition shows.

3.5. Proposition. A linear statistical space $(\mathbb{R}^k, g^0, T)$ can be embedded into a linear statistical space $(\mathbb{R}^N, g^0, T_v)$ if and only if $N \ge k$ and $T$ is also of trace type: $T = T_w$ with $|w| \le |v|$.

Proof. The necessity follows from the fact that the restriction of $T_v$ to $\mathbb{R}^k$ equals $T_{\bar v}$, where $\bar v$ is the orthogonal projection of $v$ onto $\mathbb{R}^k$; in particular $|\bar v| \le |v|$. Conversely, if $|w| \le |v|$, we can find an orthogonal transformation such that $w$ equals the orthogonal projection of $v$ onto $\mathbb{R}^k$. □

3.6. Comasses as monotone invariants. Since the metric $g$ extends canonically to the space $S^3(\mathbb{R}^n)$, we can define the absolute norm

$$\|T\| := \sqrt{\langle T, T\rangle}.$$

Now we define the comasses of a 3-symmetric tensor $T$ as follows:

$$M^3(T) := \max_{|x|=|y|=|z|=1} T(x,y,z), \qquad M^2(T) := \max_{|x|=|y|=1} T(x,x,y), \qquad M^1(T) := \max_{|x|=1} T(x,x,x).$$

Clearly we have

$$0 \le M^1(T) \le M^2(T) \le M^3(T) \le \|T\|.$$
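The comasses can be estimated by brute force in low dimension. The following sketch uses grid sampling on the unit circle (an arbitrary choice that only gives lower bounds, not an exact maximization) for tensors on $\mathbb{R}^2$ stored as $2 \times 2 \times 2$ arrays; for $T = (dx^1)^3 + (dx^2)^3$ all three comasses equal 1, while $\|T\| = \sqrt2$, in line with the chain of inequalities above. Function names are hypothetical.

```python
# Sketch: grid estimates of the comasses M^1, M^2, M^3 of a 3-tensor on R^2.
import itertools
import math


def evaluate(T, x, y, z):
    """T(x, y, z) = sum_{ijk} T_ijk x_i y_j z_k for a 2x2x2 array T."""
    return sum(T[i][j][k] * x[i] * y[j] * z[k]
               for i, j, k in itertools.product(range(2), repeat=3))


def unit_vectors(steps):
    return [(math.cos(2 * math.pi * s / steps), math.sin(2 * math.pi * s / steps))
            for s in range(steps)]


def comasses(T, steps=48):
    """Grid lower bounds for (M^1, M^2, M^3)."""
    us = unit_vectors(steps)
    m1 = max(evaluate(T, u, u, u) for u in us)
    m2 = max(evaluate(T, u, u, w) for u in us for w in us)
    m3 = max(evaluate(T, u, v, w) for u in us for v in us for w in us)
    return m1, m2, m3


def norm(T):
    return math.sqrt(sum(T[i][j][k] ** 2 for i, j, k in itertools.product(range(2), repeat=3)))


if __name__ == "__main__":
    # T = (dx^1)^3 + (dx^2)^3: T_111 = T_222 = 1, all other components 0
    T = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
    print(comasses(T), norm(T))   # about (1, 1, 1) and sqrt(2)
```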
3.7. Proposition. The comasses are non-negative functions which vanish at $T$ if and only if $T$ equals zero. They are monotone invariants of $T$: if $T$ is the restriction of a 3-symmetric tensor $\tilde T$ on $\mathbb{R}^N$, then

$$\|T\| \le \|\tilde T\|, \qquad M^i(T) \le M^i(\tilde T), \quad i = 1, 2, 3. \qquad (3.7.1)$$

Proof. To prove the first statement it suffices to show that $M^1$ vanishes only at $T = 0$. Since $T(-x,-x,-x) = -T(x,x,x)$, the condition $M^1(T) = 0$ implies $T(x,x,x) = 0$ for all $x$. Then $T = 0$ by the polarization identity

$$24\, T(x,y,z) = T(x+y+z, x+y+z, x+y+z) - T(x+y-z, x+y-z, x+y-z) - T(x-y+z, x-y+z, x-y+z) - T(-x+y+z, -x+y+z, -x+y+z).$$

The second statement follows immediately from the definition. □
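The polarization identity can be checked numerically on random data. This is a minimal sketch under our own choices of dimension and random seed; it compares $T(x,y,z)$ with one twenty-fourth of the signed sum of the four cubic values $C(u) = T(u,u,u)$. The helper names are hypothetical.

```python
# Sketch: check 24 T(x,y,z) = C(x+y+z) - C(x+y-z) - C(x-y+z) - C(-x+y+z), C(u) = T(u,u,u).
import itertools
import random


def random_symmetric(n, rng):
    A = [[[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)] for _ in range(n)]
    S = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i, j, k in itertools.product(range(n), repeat=3):
        perms = itertools.permutations((i, j, k))
        S[i][j][k] = sum(A[a][b][c] for a, b, c in perms) / 6.0
    return S


def evaluate(T, x, y, z):
    n = len(x)
    return sum(T[i][j][k] * x[i] * y[j] * z[k]
               for i, j, k in itertools.product(range(n), repeat=3))


def cubic(T, u):
    return evaluate(T, u, u, u)


def polarization(T, x, y, z):
    """Right-hand side of the identity divided by 24."""
    def comb(a, b, c):
        return [a * xi + b * yi + c * zi for xi, yi, zi in zip(x, y, z)]
    return (cubic(T, comb(1, 1, 1)) - cubic(T, comb(1, 1, -1))
            - cubic(T, comb(1, -1, 1)) - cubic(T, comb(-1, 1, 1))) / 24.0


if __name__ == "__main__":
    rng = random.Random(0)
    n = 4
    T = random_symmetric(n, rng)
    x, y, z = ([rng.gauss(0, 1) for _ in range(n)] for _ in range(3))
    print(evaluate(T, x, y, z), polarization(T, x, y, z))   # equal up to rounding
```

Expanding the four cubes, every monomial except $6\,T(x,y,z)$ cancels in the signed sum, which leaves $4 \cdot 6 = 24$ copies of $T(x,y,z)$.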

Now for a space $(\mathbb{R}^n, g^0, T)$ and for $1 \le k \le n$ we put

$$\lambda_k(T) := \min_{\mathbb{R}^k \subset \mathbb{R}^n} M^1(T|_{\mathbb{R}^k}).$$

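We can easily check that if $T$ is the restriction of $\tilde T$ to a subspace $W^m \subset \mathbb{R}^n$, then

$$\lambda_k(T) \ge \lambda_k(\tilde T) \ge 0 \quad \text{for all } k \le m,$$

since every $k$-dimensional subspace of $W^m$ is also a $k$-dimensional subspace of $\mathbb{R}^n$. Thus $\lambda_k(T)$ is a monotone (order-reversing) invariant of linear statistical spaces. These invariants are related by the following inequalities:

$$M^1(T) = \lambda_n(T) \ge \lambda_{n-1}(T) \ge \dots \ge \lambda_2(T) \ge \lambda_1(T) = 0.$$

The last equality follows from the fact that the function $T(x,x,x)$ is anti-symmetric on $S^{n-1} = \{|x| = 1\} \subset \mathbb{R}^n$ and $S^{n-1}$ is connected, so that it vanishes at some unit vector. We observe that if $T$ is of trace type, then $\lambda_{n-1}(T) = \dots = \lambda_1(T) = 0$, since $T_v$ vanishes on the hyperplane $v^\perp$.

We are going to give a lower bound for the monotone invariant $\lambda_{n-1}$ of linear statistical spaces of a certain type. The inequality $\lambda_{n-1}(\mathbb{R}^n, g^0, T) \ge \lambda$ means that no hyperplane with comass $M^1$ strictly less than $\lambda$ can be embedded in $(\mathbb{R}^n, g^0, T)$.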

3.8. Lemma. a) Let $T = \sum_{i=1}^{n} (1 - \varepsilon_i)(x_i)^3$ be a 3-symmetric tensor on $\mathbb{R}^n$ with $n \ge 4$ and $|\varepsilon_i| \le 1/4$. Then we have

$$\lambda_{n-1}(T) \ge \ldots - 1/4.$$

b) Let $T = \sum_{i=1}^{n} (x_i)^3$, let $H$ be the hyperplane in $\mathbb{R}^n$ which is orthogonal to $(kn, 1, 1, \dots, 1)$, and let $n \ge 5$, $k \ge 3$. Then we have

$$\lambda_{n-2}(T|_H) \ge \ldots - 1.$$

c) Let $x = (1 - \varepsilon, \ldots) \in S^n \subset \mathbb{R}^{n+1}$, where $n \ge 4$, $k \ge n$. We denote by $H$ the tangent plane $T_xS^n$, and by $T^0$ the following 3-symmetric tensor on $\mathbb{R}^{n+1}$:

$$T^0_{ijk}(x_1, \dots, x_{n+1}) = \frac{2\,\delta_{ijk}}{x_i}. \qquad (3.8.1)$$

Then we have

$$\lambda_{n-1}(T^0|_H) \ge \ldots - 1.$$

3.8.2. Remark. The tensor $T^0$ in (3.8.1) defines on $(\mathbb{R}^{n+1}_+, g^0)$ a statistical structure with the weak probability potential $\{\tfrac14 x_i^2,\ i = 1, \dots, n+1\}$.

Proof of Lemma 3.8. The reader will see that Lemma 3.8 can be proved along the same lines as Sublemma 5.10. Therefore we do not repeat the argument here.

3.8.3. Remark. Lemma 3.8.a also holds for $n = 3$ but not for $n = 2$; Lemma 3.8.b also holds for $n = 4$ but not for $n = 3$; and Lemma 3.8.c also holds for $n = 3$ but not for $n = 2$.

There are also several obvious monotone invariants of $T$:

$$A^1(T) := \max_{\substack{|x|=|y|=|z|=1\\ \langle x,y\rangle = \langle y,z\rangle = \langle z,x\rangle = 0}} T(x,y,z),$$

which is well defined for $n \ge 3$, and

$$A^2(T) := \max_{\substack{|x|=|y|=1\\ \langle x,y\rangle = 0}} T(x,y,y),$$

which is well defined for $n \ge 2$. We can check that

$$\ker A^1 = \mathcal{R}^n.$$

On the other hand we have

$$\ker A^2 \subset \mathcal{V}(3\pi_1).$$

Thus $A^1$ and $A^2$ are different invariants.

3.9. Lemma. Let $\pi_1(T)$ be the first component of $T$ in the decomposition (3.2). Then $\|T\|_1 := \|\pi_1(T)\|$ is a monotone invariant of $T$.

Proof. Let $\mathbb{R}^k$ be a subspace of $\mathbb{R}^n$, and denote by $T|_{\mathbb{R}^k}$ the restriction of $T$ to $\mathbb{R}^k$. Clearly

$$T|_{\mathbb{R}^k} = (\pi_1 T)|_{\mathbb{R}^k} + (\pi_2 T)|_{\mathbb{R}^k}.$$

We have noticed in Proposition 3.5 that the restriction of the trace-type form $\pi_2 T$ to any subspace is also of trace type. Thus $(\pi_2 T)|_{\mathbb{R}^k}$ is an element of $\mathcal{R}^k \subset S^3(\mathbb{R}^k)$. Hence we have

$$\pi_1(T|_{\mathbb{R}^k}) = \pi_1\big((\pi_1 T)|_{\mathbb{R}^k}\big). \qquad (3.9.1)$$

Since the projection $\pi_1$ and the restriction to $\mathbb{R}^k$ do not increase the norm $\|\cdot\|$, we get

$$\|T|_{\mathbb{R}^k}\|_1 = \big\|\pi_1\big((\pi_1 T)|_{\mathbb{R}^k}\big)\big\| \le \|\pi_1(T)\| = \|T\|_1.$$

□

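3.10. Proposition. A statistical line $(\mathbb{R}, g^0, T)$ can be embedded into $(\mathbb{R}^N, g^0, T')$ if and only if $M^1(T) \le M^1(T')$.

Proof. The necessity follows from Proposition 3.7, so it suffices to show that we can embed $(\mathbb{R}, g^0, T)$ into $(\mathbb{R}^N, g^0, T')$ if $M^1(T) \le M^1(T')$. We note that $T'(v,v,v)$ defines an anti-symmetric function on the sphere $S^{N-1} = \{|v| = 1\} \subset \mathbb{R}^N$; since the sphere is connected, this function takes every value in $[-M^1(T'), M^1(T')]$. Thus there is a point $v \in S^{N-1}$ such that $T'(v,v,v) = M^1(T)$. Clearly the line $v \otimes \mathbb{R}$, suitably oriented, defines the required embedding. □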
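The proof of Proposition 3.10 is constructive once a near-maximizer of $T'(v,v,v)$ is known. The sketch below is one way to carry it out, under the assumption that sampling (plus the basis vectors) finds a unit vector $v^*$ with $T'(v^*,v^*,v^*) \ge |t|$; it then bisects along the great half-circle from $v^*$ to $-v^*$, on which $T'(v,v,v) - t$ goes from a non-negative to a non-positive value. The function name `unit_with_value` and the sampling parameters are hypothetical.

```python
# Sketch of the construction behind Proposition 3.10 (heuristic search + bisection).
import itertools
import math
import random


def cubic(T, u):
    """T(u, u, u) for an n x n x n nested list T."""
    n = len(u)
    return sum(T[i][j][k] * u[i] * u[j] * u[k]
               for i, j, k in itertools.product(range(n), repeat=3))


def normalize(u):
    r = math.sqrt(sum(a * a for a in u))
    return [a / r for a in u]


def unit_with_value(T, t, samples=2000, seed=0):
    """Return a unit v with T(v,v,v) = t, assuming |t| <= M^1(T) and n >= 2."""
    n = len(T)
    rng = random.Random(seed)
    # candidate near-maximizers: basis vectors plus random unit vectors
    cands = [[float(a == m) for a in range(n)] for m in range(n)]
    cands += [normalize([rng.gauss(0, 1) for _ in range(n)]) for _ in range(samples)]
    v_star = max(cands, key=lambda u: cubic(T, u))
    top = cubic(T, v_star)
    if top < abs(t):
        raise ValueError("no sampled unit vector reaches |t|; increase samples")
    # unit w orthogonal to v_star, by one Gram-Schmidt step
    r = [rng.gauss(0, 1) for _ in range(n)]
    proj = sum(a * b for a, b in zip(r, v_star))
    w = normalize([a - proj * b for a, b in zip(r, v_star)])

    def point(s):
        # great half-circle from v_star (s = 0) to -v_star (s = pi)
        return [math.cos(s) * a + math.sin(s) * b for a, b in zip(v_star, w)]

    lo, hi = 0.0, math.pi   # cubic(point(lo)) - t >= 0 >= cubic(point(hi)) - t
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cubic(T, point(mid)) - t >= 0:
            lo = mid
        else:
            hi = mid
    return point(lo)


if __name__ == "__main__":
    n = 3
    T = [[[1.0 if i == j == k else 0.0 for k in range(n)] for j in range(n)] for i in range(n)]
    v = unit_with_value(T, 0.5)   # T' = sum_i (dx^i)^3, so M^1(T') = 1
    print(v, cubic(T, v))
```

Sending the unit vector $e$ of the line $(\mathbb{R}, g^0, T)$, with $T(e,e,e) = t$, to the returned $v$ gives an embedding of the kind used in Proposition 3.10.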

Let us consider the embedding problem for 2-dimensional linear statistical spaces. It is easy to see that

$$S^3(\mathbb{R}^2) = \mathbb{R}^2 \oplus \mathbb{R}^2.$$

Thus the quotient $S^3(\mathbb{R}^2)/SO(2)$ equals $(\mathbb{R}^2 \oplus \mathbb{R}^2)/S^1$. Geometrically there are several ways to see this. In the first way we denote the components of $T \in S^3(\mathbb{R}^2)$ by $T_{111}, T_{112}, T_{122}, T_{222}$.

3.11. Lemma. For every non-vanishing $T$ there exists an oriented orthonormal basis of $\mathbb{R}^2$ such that $T_{111} = M^1(T) > 0$ and $T_{112} = 0$. The numbers $(T_{111}, T_{122}, T_{222})$ are called the canonical coordinates of $T$. Two tensors $T$ and $T'$ are equivalent if and only if they have the same canonical coordinates.

Proof. We choose an oriented orthonormal basis $(v_1, v_2)$ by taking as $v_1$ a point on $S^1 = \{|x| = 1\}$ where the function $T(x,x,x)$ reaches its maximum. The first variation formula shows that in this case $T_{112} = 0$. This shows the existence of the canonical coordinates. Clearly, if two tensors have the same canonical coordinates, then they are equivalent. Conversely, if two tensors $T$ and $T'$ are equivalent, then their comasses $M^1$ are the same. We need to take care of the case when there are several points $x$ at which $T(x,x,x)$ reaches the maximum. In any case, the tensors have the same first coordinate. Next we note that, in canonical coordinates,

$$\langle \mathrm{Tr}(T), \mathrm{Tr}(T)\rangle = (T_{111} + T_{122})^2 + T_{222}^2, \qquad \|T\|^2 = T_{111}^2 + 3T_{122}^2 + T_{222}^2.$$

Thus if two tensors are equivalent and have the same first coordinate, they must have the same second coordinate $T_{122}$, and the third coordinate $T_{222}$ is uniquely defined up to sign. The condition on the orientation tells us that the sign must be $+$. This proves the second statement. □
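The construction in the proof of Lemma 3.11 can be sketched numerically: maximize $T(u,u,u)$ over unit vectors $u = (\cos\theta, \sin\theta)$, take $v_1 = u$ and $v_2$ its rotation by $+\pi/2$, and read off the components. The grid-plus-golden-section maximization, the storage of $T$ as the tuple $(T_{111}, T_{112}, T_{122}, T_{222})$ and the function names are our own assumptions. Rotating a tensor whose maximum is attained at a single point should leave its canonical coordinates unchanged, which the usage example illustrates.

```python
# Sketch of Lemma 3.11: canonical coordinates of a non-zero T in S^3(R^2).
# T is stored as (T_111, T_112, T_122, T_222); the maximization method is our choice.
import math


def full_tensor(T):
    """2x2x2 array; index 1 stands for '2', so T_ijk depends only on i + j + k."""
    return [[[T[i + j + k] for k in range(2)] for j in range(2)] for i in range(2)]


def evaluate(F, x, y, z):
    return sum(F[i][j][k] * x[i] * y[j] * z[k]
               for i in range(2) for j in range(2) for k in range(2))


def components_in_basis(T, theta):
    """Components of T in the oriented basis v1 = (cos, sin), v2 = (-sin, cos)."""
    F = full_tensor(T)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return (evaluate(F, v1, v1, v1), evaluate(F, v1, v1, v2),
            evaluate(F, v1, v2, v2), evaluate(F, v2, v2, v2))


def canonical_coordinates(T, steps=3600):
    """Return ((T_111, T_122, T_222), T_112) in a basis whose v1 maximizes T(u,u,u)."""
    F = full_tensor(T)

    def f(theta):
        u = (math.cos(theta), math.sin(theta))
        return evaluate(F, u, u, u)

    h = 2 * math.pi / steps
    best = max(range(steps), key=lambda s: f(s * h)) * h
    a, b = best - h, best + h
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(100):                  # golden-section search for the maximum
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    t111, t112, t122, t222 = components_in_basis(T, 0.5 * (a + b))
    return (t111, t122, t222), t112       # t112 should vanish (first variation)


if __name__ == "__main__":
    T = (2.0, 0.0, 0.1, 0.3)
    rotated = components_in_basis(T, 0.4)   # an equivalent tensor
    print(canonical_coordinates(rotated))   # close to ((2.0, 0.1, 0.3), 0.0)
```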



3.12. Proposition. We can always embed the 2-dimensional statistical space $(\mathbb{R}^2, g^0, 0)$ into any linear statistical space $(\mathbb{R}^n, g^0, T)$ if $n \ge 7$.
Proof. It suffices to prove the statement for $n = 7$. We denote by $O(T)$ the set of all unit vectors $v \in S^6$ such that $T(v,v,v) = 0$. Clearly $O(T)$ is a set of dimension 5 in $S^6$. Since $T$ is anti-symmetric, i.e. $T(-v,-v,-v) = -T(v,v,v)$, there exists a connected component $O^0(T)$ of $O(T)$ which is invariant under the antipodal involution. Now we consider the following function $f$ on $O^0(T)$. For each $v \in O^0(T)$ we denote by $A_v$ the bilinear symmetric 2-form on the space $T_vO^0(T)$, considered as a subspace of $\mathbb{R}^7$:

$$A_v(y,z) = T(v,y,z).$$

Then we define $f(v) := \det(A_v)$. Since $O^0(T)$ has dimension 5 and $A_{-v} = -A_v$, the function $f(v)$ is anti-symmetric on $O^0(T)$. Hence the set $O^0_0(T)$ of all $v \in O^0(T)$ with $f(v) = 0$ has dimension 4, and it contains a connected component which is also invariant under the antipodal involution. For simplicity we denote this connected component also by $O^0_0(T)$. Now we consider the following two possible cases.

Case 1. Assume that there is a point $v \in O^0_0(T)$ such that the nullity of $A_v$ is at least 2. Then there are two linearly independent vectors $y, z \in T_vO^0(T)$ such that the restriction of $A_v$ to the plane $\mathbb{R}^2(y,z)$ vanishes. Since the set $O^0(T)$ is connected, invariant under the antipodal involution and of codimension 1 in $S^6$, the plane $\mathbb{R}^2(y,z)$ has a non-empty intersection with $O^0(T)$ at a point $w$. Then the restriction of $T$ to the plane $\mathbb{R}^2(v,w)$ vanishes, because

$$T(v,v,v) = T(w,w,w) = 0,$$
$$T(v,w,w) = 0 \quad (\text{since } A_v(w,w) = 0),$$
$$T(v,v,w) = 0 \quad (\text{since } w \in T_vO^0(T)).$$

Case 2. Assume that the nullity of $A_v$ on $O^0_0(T)$ is constantly 1. Using the anti-symmetry of $A_v$ we conclude that the restriction of $A_v$ to the 4-dimensional subspace $\mathbb{R}^4(v)$ which is orthogonal to the kernel of $A_v$ has index constantly 2. Thus there exists a vector $z$, orthogonal to the kernel $y$ of $A_v$, such that $A_v(z,z) = 0$. Clearly the restriction of $A_v$ to the plane $\mathbb{R}^2(y,z)$ vanishes. Now we can repeat the argument of Case 1 to get a vector $w$ such that the restriction of $T$ to $\mathbb{R}^2(v,w)$ vanishes. □

3.13. Theorem. a) Any statistical space $(\mathbb{R}^n, g^0, T)$ can be embedded in the statistical space $\big(\mathbb{R}^{n(n+1)}, g^0, T' = 2\|T\| \sum_{i=1}^{n(n+1)} (x_i)^3\big)$, where $x_i$ are the canonical Euclidean coordinates on $\mathbb{R}^{n(n+1)}$.

b) The trivial space $(\mathbb{R}^n, g^0, 0)$ can be embedded into $\big(\mathbb{R}^{2n}, g^0, \sum_{i=1}^{2n} (dx^i)^3\big)$ for all $n$.

Proof. a) We prove the statement by induction. The statement for $n = 1$ follows from Proposition 3.10. Suppose that the statement is valid for all $n \le k$.

3.14. Lemma. Suppose that $T \in S^3(\mathbb{R}^{k+1})$. Then there are orthonormal coordinates $x_1, \dots, x_{k+1}$ such that

$$T = M^1(T)\,(x_1)^3 + \sum_{i=2}^{k+1} a_i\, x_1 (x_i)^2 + T|_{\mathbb{R}^k}(x_2, \dots, x_{k+1}), \qquad (3.14.1)$$

where $\mathbb{R}^k = \{x_1 = 0\}$.
Proof of Lemma 3.14. We choose $v_1$ as the unit vector in $S^k \subset \mathbb{R}^{k+1}$ at which the function $T(v,v,v)$ reaches its maximum on the unit sphere $S^k$. The first variation formula shows that $T(v_1, v_1, w) = 0$ for all $w$ orthogonal to $v_1$. We denote by $\mathbb{R}^k$ the orthogonal complement of $\mathbb{R} \cdot v_1$. Now we consider the bilinear symmetric form $A$ on $\mathbb{R}^k$ defined by

$$A(x,y) = T(v_1, x, y).$$

There is an orthonormal basis of $\mathbb{R}^k$ in which $A(x,x) = \sum_{i=2}^{k+1} a_i x_i^2$. Clearly in this orthonormal basis we can write $T$ in the form (3.14.1). □

Continuation of the proof of Theorem 3.13.a. We shall show explicitly that any statistical space $(\mathbb{R}^2, g^0, T = a_2 x_1 (x_2)^2)$, that is, $T(v_1, v_2, v_2) = a_2$ and all other components of $T$ vanish, can be embedded in $\big(\mathbb{R}^4, g^0, \sum_{i=1}^4 (x_i)^3\big)$ if $0 \le |a_2| \le 1/2$. We put

$$L(v_1) := \pm\Big(\frac12, \frac12, -\frac12, -\frac12\Big), \qquad (3.15.1)$$

$$L(v_2) := \Big(\sqrt{\tfrac{1+2|a_2|}{4}},\ -\sqrt{\tfrac{1+2|a_2|}{4}},\ \sqrt{\tfrac{1-2|a_2|}{4}},\ -\sqrt{\tfrac{1-2|a_2|}{4}}\Big). \qquad (3.15.2)$$

Here we take the sign $+$ in (3.15.1) if $a_2 > 0$, and the sign $-$ if $a_2 < 0$. Clearly, $L$ defines the required embedding $\mathbb{R}^2 \to \mathbb{R}^4$.

This together with Proposition 3.10 and the induction assumption completes the proof of Theorem 3.13.a.

Proof of Theorem 3.13.b. We decompose the embedding $f : (\mathbb{R}^n, g^0, 0) \to \big(\mathbb{R}^{2n}, g^0, \sum_{i=1}^{2n} (x_i)^3\big)$ as follows:

$$f(x_1, \dots, x_n) = \big(f_1(x_1), \dots, f_n(x_n)\big),$$

where $f_i$ embeds the line $(\mathbb{R}, (dx^i)^2, 0)$ into $\big(\mathbb{R}^2, (dx^{2i-1})^2 + (dx^{2i})^2, (dx^{2i-1})^3 + (dx^{2i})^3\big)$, for instance $f_i(x_i) = (x_i/\sqrt2, -x_i/\sqrt2)$. Clearly, $f$ is the required embedding. □
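The explicit formulas (3.15.1) and (3.15.2) and the line embedding in the proof of 3.13.b can be verified by direct computation; the sketch below does it numerically for a few values of $a_2$. It reads $T = a_2 x_1 (x_2)^2$ as the tensor with $T(v_1, v_2, v_2) = a_2$ and all other components zero, which is our interpretation of that shorthand; the function names are hypothetical.

```python
# Sketch: check of (3.15.1)-(3.15.2) and of the line embedding in the proof of 3.13.b.
import math


def cube_form(x, y, z):
    """T'(x, y, z) = sum_i x_i y_i z_i, i.e. T' = sum_i (dx^i)^3."""
    return sum(a * b * c for a, b, c in zip(x, y, z))


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def embedding_3_15(a2):
    """Images L(v1), L(v2) in R^4 from (3.15.1), (3.15.2)."""
    if abs(a2) > 0.5:
        raise ValueError("(3.15.2) requires |a_2| <= 1/2")
    sign = 1.0 if a2 >= 0 else -1.0
    u = [sign * 0.5, sign * 0.5, -sign * 0.5, -sign * 0.5]
    s = math.sqrt((1 + 2 * abs(a2)) / 4)
    r = math.sqrt((1 - 2 * abs(a2)) / 4)
    return u, [s, -s, r, -r]


def pulled_back_components(u, w):
    """(T(v1,v1,v1), T(v1,v1,v2), T(v1,v2,v2), T(v2,v2,v2)) for T = L^* T'."""
    return (cube_form(u, u, u), cube_form(u, u, w), cube_form(u, w, w), cube_form(w, w, w))


if __name__ == "__main__":
    for a2 in (-0.5, -0.2, 0.0, 0.35, 0.5):
        u, w = embedding_3_15(a2)
        print(a2, dot(u, u), dot(w, w), dot(u, w), pulled_back_components(u, w))
    # 3.13.b: the line x -> (x/sqrt2, -x/sqrt2) is isometric and T'-null
    e = [1 / math.sqrt(2), -1 / math.sqrt(2)]
    print(dot(e, e), cube_form(e, e, e))
```

By hand: with $u = L(v_1)$ and $w = L(v_2)$ one finds $\sum u_i^3 = \sum u_i^2 w_i = \sum w_i^3 = 0$ and $\sum u_i w_i^2 = \pm(s^2 - r^2) = \pm|a_2| = a_2$, where $s^2 = (1+2|a_2|)/4$ and $r^2 = (1-2|a_2|)/4$.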

4 Monotone invariants and obstructions to embeddings of statistical manifolds

Let $\mathcal{K}(M, e)$ denote the category of statistical manifolds $M$ with morphisms being embeddings. Functors from this category to an ordered set (regarded as a category) are called monotone invariants of statistical manifolds. Clearly any monotone invariant is an invariant of statistical manifolds.

4.1. Examples. There are many monotone invariants which arise from our 
analysis in section 3. 
a) Trace type of a statistical manifold. A statistical manifold $(M, g, T)$ is called of trace type if for all $x \in M$ the form $T(x)$ is of trace type (see 3.1). It follows from Proposition 3.5 that any statistical submanifold of a statistical manifold of trace type is also of trace type. Thus the trace type is a monotone invariant. In particular, we cannot embed the statistical space $\mathrm{Cap}_N$ or the normal Gaussian space into any statistical space of trace type. On the other hand, unlike the linear case, we cannot always embed a statistical manifold of trace type into another one of trace type, even if the norm condition is satisfied. For example, if the trace form is closed (resp. exact), then the trace form of any of its submanifolds is also closed (resp. exact). Hence within the class of statistical manifolds of trace type we get new monotone invariants, which can be expressed via the closedness and the cohomology class of the corresponding trace form.

b) Decomposability of a statistical manifold. We note that the class of 3-symmetric tensors of trace type is a subclass of the class of all decomposable tensors $T$, i.e. symmetric products of 1-forms and symmetric 2-forms. Any statistical submanifold of a statistical manifold with a decomposable tensor $T$ also has a decomposable (induced) tensor. Thus decomposability is also a monotone invariant. The 2-dimensional normal Gaussian manifold is an example of decomposable type but not of trace type.

c) Rank and comass. For any statistical manifold $(M, g, T)$ we define the following numbers:

$$\mathrm{rank}(T) := \sup_{x \in M} \mathrm{rank}(T(x)), \qquad \|T\| := \sup_{x \in M} \|T(x)\|,$$
$$M^1(T)_0 := \sup_{x \in M} M^1(T(x)), \qquad \|T\|_{1,0} := \sup_{x \in M} \|T(x)\|_1.$$

Clearly these four numbers are monotone invariants of statistical manifolds.

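We recall that the normal Gaussian statistical manifold is the two-dimensional statistical model given by the upper half-plane $\{(\mu, \sigma) \in \mathbb{R}^2 : \sigma > 0\}$ with the potential

$$p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big),$$

where $x \in \mathbb{R}$.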

4.2. Proposition. No statistical manifold which is conformally equivalent to the space $\mathrm{Cap}_N$ can be embedded into the direct product of $m$ copies of the normal Gaussian statistical manifold, for any $N > 3$ and any finite $m$.

Proof. It is easy to check that $M^1(\mathrm{Cap}_N) = \infty$. Thus any statistical manifold which is conformally equivalent to $\mathrm{Cap}_N$ also has infinite invariant $M^1$. On the other hand, we compute easily that the invariant $M^1$ of the normal Gaussian manifold, as well as that of a direct product of finitely many copies of it, is finite. Namely, with the normalization (2.6.1), $M^1(T(\mu,\sigma)) = 2\sqrt2$ for all $(\mu,\sigma)$. □
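The finiteness claim for the Gaussian manifold can be checked numerically. The sketch below integrates the scores $\partial_\mu \ln p = (x-\mu)/\sigma^2$ and $\partial_\sigma \ln p = ((x-\mu)^2 - \sigma^2)/\sigma^3$ against the density with a midpoint rule, to obtain $g$ by (2.2) and $T$ by (2.6.1), and then maximizes $T(v,v,v)/g(v,v)^{3/2}$ over directions $v$. With this normalization one gets $g = \mathrm{diag}(1, 2)/\sigma^2$, $T_{\mu\mu\sigma} = 2/\sigma^3$, $T_{\sigma\sigma\sigma} = 8/\sigma^3$ and a comass of about $2\sqrt2$, independent of $(\mu, \sigma)$. Integration range, step counts and function names are arbitrary choices.

```python
# Sketch: Fisher metric, cubic form and comass M^1 of the normal Gaussian manifold.
import math


def gaussian_g_T(mu, sigma, half_width=12.0, steps=20000):
    """g by (2.2) and T by (2.6.1) in coordinates (mu, sigma), midpoint rule."""
    g = [[0.0, 0.0], [0.0, 0.0]]
    T = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    dx = 2 * half_width * sigma / steps
    start = mu - half_width * sigma
    for k in range(steps):
        x = start + (k + 0.5) * dx
        p = math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)
        score = ((x - mu) / sigma**2, ((x - mu) ** 2 - sigma**2) / sigma**3)
        for i in range(2):
            for j in range(2):
                g[i][j] += score[i] * score[j] * p * dx
                for l in range(2):
                    T[i][j][l] += score[i] * score[j] * score[l] * p * dx
    return g, T


def comass_M1(g, T, steps=3600):
    """max over directions v of T(v,v,v) / g(v,v)^(3/2) (scale invariant)."""
    best = -math.inf
    for s in range(steps):
        v = (math.cos(2 * math.pi * s / steps), math.sin(2 * math.pi * s / steps))
        gv = sum(g[i][j] * v[i] * v[j] for i in range(2) for j in range(2))
        tv = sum(T[i][j][l] * v[i] * v[j] * v[l]
                 for i in range(2) for j in range(2) for l in range(2))
        best = max(best, tv / gv**1.5)
    return best


if __name__ == "__main__":
    g, T = gaussian_g_T(0.3, 1.7)
    print(g, T[1][1][1], 8 / 1.7**3)
    print(comass_M1(g, T), 2 * math.sqrt(2))
```

In the orthonormal frame $e_1 = \sigma\partial_\mu$, $e_2 = (\sigma/\sqrt2)\partial_\sigma$ one has $T(u,u,u) = \sqrt2\,\sin\theta\,(3 - \sin^2\theta)$ for $u = \cos\theta\, e_1 + \sin\theta\, e_2$, which is maximal at $\theta = \pi/2$ with value $2\sqrt2$.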

4.3. Diameters of statistical manifolds. For a positive number $p > 0$ and a statistical manifold $(M, g, T)$ we set

$$d_p(M, g, T) := \sup\{l \in \mathbb{R}_+ \cup \{\infty\} \mid \exists \text{ an embedding of } ([0, l], dx^2, p\,(dx)^3) \text{ into } (M, g, T)\}.$$

We call $d_p(M, g, T)$ the diameter with weight $p$ of $(M, g, T)$. Clearly the $d_p$ are monotone invariants for all $p$.

To estimate the diameter with weight $p$ of a given statistical manifold $(M, g, T)$ we can proceed as follows. For each point $x \in M$ we denote by $D_p(x)$ the set of all unit tangent vectors $v \in T_xM$ such that $T(v,v,v) = p$. We denote by $D^i_p(x)$ the connected components of $D_p(x)$. We say that a unit vector $v$ in $T_xM$ is $p$-characteristic with weight $c(x)$ if there exists $i$ such that

$$c(x) = \min_{w \in D^i_p(x)} \langle v, w\rangle > 0.$$

We say that a point $x \in M$ is $p$-regular if there is an open neighborhood $U_\varepsilon(x) \subset M$ such that $D_p(U_\varepsilon) = U_\varepsilon \times D_p(x)$. It is easy to see that the set of all $p$-regular points is open and dense in $M$ for any given $p$.

4.4. Proposition. The diameter $d_p$ of $(M^m, g, T)$ is infinite if $m \ge 3$ and there exists a number $\varepsilon > 0$ such that one of the following two conditions holds:

a) There exists a $(p+\varepsilon)$-regular point $x \in M$ such that the convex hull $\mathrm{Conv}(D^i_{p+\varepsilon}(x))$ of one of the connected components $D^i_{p+\varepsilon}(x)$ contains the origin $0 \in T_xM^m$ as an interior point.

b) $(M^m, g, T)$ has a complete Riemannian submanifold $(N, g)$ such that there exists a smooth section $x \mapsto D_{p+\varepsilon}(x) \cap TN$ over $N$.

Proof. The statement under condition a) is based on the fundamental lemma of Gromov's convex integration technique. Namely, Gromov proved [2.4.1.A, Gromov(1986)] that if the convex hull of some path-connected subset $A_0 \subset \mathbb{R}^q$ contains a small neighborhood of the origin, then there exists a map $f : S^1 \to \mathbb{R}^q$ whose derivative sends $S^1$ into $A_0$.

4.5. Lemma. Under condition a) of Proposition 4.4 there exist a small neighborhood $U_\delta(x)$ in $M$ and an embedded oriented curve $S^1 \subset U_\delta(x)$ such that for all points $s(t) \in S^1$ we have $M^1(T|_{T_{s(t)}S^1}) > p + \varepsilon/2$.

Proof of Lemma 4.5. We denote by $\mathrm{Exp}$ the exponential map $T_xM^m \to M^m$ and by $D\mathrm{Exp}$ the differential of this exponential map restricted to $S^{m-1} \times T_xM^m \subset T(T_xM^m)$. Here $S^{m-1}$ is the unit sphere in $T_xM^m$. The space $T_xM^m$ is a linear statistical space, so we denote by $M^1_x$ the induced function on $S^{m-1} \times T_xM^m$ given by

$$M^1_x(l) = T_x(l, l, l).$$
Since $D\mathrm{Exp}$ is continuous and its restriction to $S^{m-1} \times \{0\}$ is the identity, there exists a ball $B(0, \delta)$ with center $0 \in T_xM$ such that

$$\big|M^1(D\mathrm{Exp}(l)) - M^1_x(l)\big| < \varepsilon/4 \qquad (4.5.1)$$

for all $l \in S^{m-1} \times B(0, \delta) \subset T(T_xM^m)$. We can assume that $\delta$ is so small that $D\mathrm{Exp}$ is a homeomorphism on $S^{m-1} \times B(0, \delta)$.

Now we apply the above-mentioned Gromov lemma [2.4.1.A, Gromov1986] to get an oriented curve $S^1(t)$ in the linear space $T_xM$ such that

$$\frac{d}{dt}S^1(t) \in D^i_{p+\varepsilon}(x) \qquad (4.5.2)$$

for all $t$. Next we observe that for all $a > 0$ the curve $a \cdot S^1(t)$ has the same norm as $S^1(t)$, i.e.

$$M^1_x\big(T|_{(a \cdot S^1)(t)}\big) = M^1_x\big(T|_{S^1(t)}\big) = p + \varepsilon.$$

Thus we can assume that our curve $S^1(t)$, which satisfies (4.5.2), lies in the ball $B(0, \delta)$. By our choice of $\delta$ (see (4.5.1)), we get from (4.5.2)

$$p + \tfrac34 \varepsilon < M^1\big(\mathrm{Exp}(S^1(t))\big) < p + \tfrac54 \varepsilon \qquad (4.5.3)$$

for all $t$. The curve $\mathrm{Exp}(S^1(t))$ is an immersed curve. To get an embedded curve we perturb the immersed curve so that the condition of Lemma 4.5 is still satisfied. This is possible since $m \ge 3$. □

Now let us continue the proof of Proposition 4.4.a. We denote by $S^1(t)$ the embedded curve of Lemma 4.5. Next, by choosing a tubular neighborhood of $S^1(t)$, we get a (small, thin) oriented embedded solid torus $T^3(t,s,r) = S^1(t) \times S^1(s) \times [0, R]$ in $M^m$ such that our embedded curve is exactly the central curve $S^1(t) \times \{0\} \times \{0\}$ of the solid torus. We can choose this torus $T^3$ so thin that for all $s, t, r$ we have

$$M^1\big(T^2_r(t,s)\big) > p + \frac{\varepsilon}{2}, \qquad (4.5.4)$$

where $M^1(T^2_r(t,s))$ denotes the comass of $T$ restricted to the tangent plane at $(t,s)$ of the torus $T^2_r = \{r = \mathrm{const}\}$.

Using (4.5.4) we choose a smooth unit vector field $V(t,s)$ on the solid torus $T^3(t,s,r)$ which is tangent to each torus $T^2_r$ and satisfies $T(V,V,V) = p$. An integral curve of this vector field is either a circle or a curve of infinite length. If there exists an integral curve of infinite length, then this curve is the curve required in Proposition 4.4. Assume now that all the integral curves are circles. Then there exists an embedding $S^1(t) \times [0, \rho] \times [0, \rho]$ such that for all $(s, r) \in [0, \rho] \times [0, \rho]$ the circle $S^1(t) \times \{s\} \times \{r\}$ is an integral curve of $V$. Now we perturb $V$ in a neighborhood $[0, a] \times [0, \rho] \times [0, a]$ with a very small $a$ so that the perturbed unit vector field $V$ still satisfies $T(V,V,V) = p$ and its integral curves are no longer periodic. This completes the proof of the first part of Proposition 4.4.
Using the same argument we can prove the second part b) of Proposition 4.4. First we get the existence of an embedded curve $S^1(t)$ of arbitrary length in $M$ such that $M^1(T|_{S^1(t)}) > p + \varepsilon/4$. Now we consider a tubular neighborhood of this curve in $M$ and apply the same argument as in the first part: on each torus $T^2(t,s)$ we get an integral curve whose unit tangent vector $V = (d/dt)S^1(t; s, r)$ satisfies the condition

$$T(V, V, V) = p.$$

If there exists an infinite integral curve, then we are done. If not, i.e. if all integral curves are circles, then we apply the perturbation method from the proof of the first part and get the desired curve. □



5 Existence of isostatistical embeddings into $\mathrm{Cap}_N$

Main Theorem. Any smooth (resp. $C^1$) statistical manifold $(M^n, g, T)$ can be immersed into the statistical manifold $(\mathrm{Cap}_N^+, g^F, T^{A\text{-}C})$ for some finite number $N$. Hence any statistical manifold is a statistical model.

We first deduce our Main Theorem for compact statistical manifolds $(M^n, g, T)$
from Theorem 5.1 and Theorem 5.5.

5.1. Theorem. Let $(M^m, g, T)$ be a compact smooth ($C^1$ resp.) statistical
manifold. Then there exist numbers $N \in \mathbb{N}^+$ and $A > 0$ as well as a smooth
($C^1$ resp.) embedding $f: (M^m, g, T) \to (\mathbb{R}^N, g_0, A \cdot T_0)$ such that $f^*(g_0) = g$
and $f^*(A \cdot T_0) = T$.

Our proof of Theorem 5.1 uses the Nash embedding theorem, the Gromov
embedding theorem and an algebraic trick. The existence of monotone invariants
prevents us from extending Theorem 5.1 to the non-compact case (in contrast to the
Riemannian case).

5.2. The Nash embedding theorem. [Nash1954, Nash1956] Any smooth
($C^1$ resp.) Riemannian manifold $(M^n, g)$ can be isometrically embedded into
$(\mathbb{R}^N, g_0)$ for some $N$ depending on $M^n$.

We denote by $T_0$ the "standard" 3-tensor on $\mathbb{R}^n$:

$T_0 := \sum_{i=1}^n (dx_i)^3.$

5.3. The Gromov immersion theorem. [Gromov1986, 2.4.9.3' and 3.1.4]

Suppose that $M^m$ is given with a smooth ($C^1$ resp.) symmetric 3-form $T$. Then
there exists an embedding $f: M^m \to \mathbb{R}^{N(m)}$ with $N(m) = 3\left(m + \binom{m+1}{2} + \binom{m+2}{3}\right)$
such that $f^*(T_0) = T$.
Proof of Theorem 5.1. First we take an immersion $f_1: (M^m, g, T) \to (\mathbb{R}^{N(m)}, g_0, T_0)$ such that

$f_1^*(T_0) = T.$

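The existence of $f_1$ follows from the Gromov immersion theorem.
Then we choose a positive number $A$ such that

$g - A^{-2/3} f_1^*(g_0) = g_1$

is a Riemannian metric on $M$, i.e. $g_1$ is a positive definite symmetric bilinear form.
Such a number $A$ exists since $M$ is compact: the symmetric form $f_1^*(g_0)$ is bounded
with respect to $g$ on $M$, so $A^{-2/3} f_1^*(g_0) < g$ for $A$ large enough.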

Now we choose an isometric embedding $f_2: (M^m, g_1) \to (\mathbb{R}^N, g_0)$. The
existence of $f_2$ follows from the Nash embedding theorem.

5.4. Lemma. There is a linear isometric embedding $L_{m+1}: \mathbb{R}^{m+1} \to
\mathbb{R}^{2m+2}$ such that $L_{m+1}^*(T_0) = 0$.

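Proof. We put

$L_{m+1}(x_1, \dots, x_{m+1}) := (\psi^1(x_1), \dots, \psi^{m+1}(x_{m+1})),$

where $\psi^i$ embeds the line $(\mathbb{R}, (dx_i)^2, 0)$ into $(\mathbb{R}^2, (dx^{2i-1})^2 + (dx^{2i})^2, (dx^{2i-1})^3 + (dx^{2i})^3)$:

$\psi^i(x_i) := \frac{1}{\sqrt{2}}(x_i, -x_i)$, so that $x_i = \frac{1}{\sqrt{2}}(x^{2i-1} - x^{2i})$ on the image.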

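Clearly, $L_{m+1}$ is the required embedding: $(\psi^i)^*((dx^{2i-1})^2 + (dx^{2i})^2) = (dx_i)^2$ and $(\psi^i)^*((dx^{2i-1})^3 + (dx^{2i})^3) = 2^{-3/2}((dx_i)^3 - (dx_i)^3) = 0$. □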
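The two properties used in Lemma 5.4 can be checked numerically. The following Python sketch (the helper names `lemma_5_4_matrix` and `check_lemma_5_4` are ours, not the paper's) builds the matrix of $L_{m+1}$ from $\psi^i(x_i) = \frac{1}{\sqrt{2}}(x_i, -x_i)$ and tests on random vectors that it preserves $g_0$ and that $T_0(Lu, Lu, Lu)$ vanishes. Since $T_0$ is symmetric, vanishing of the cubic form on the diagonal is equivalent (by polarization) to $L_{m+1}^*(T_0) = 0$.

```python
import math
import random

# Hypothetical helpers illustrating Lemma 5.4; names are not from the paper.

def lemma_5_4_matrix(k):
    """Matrix (2k x k) of L(x) = (psi^1(x_1), ..., psi^k(x_k)),
    where psi^i(t) = (t, -t) / sqrt(2) in the coordinates (x^{2i-1}, x^{2i})."""
    s = 1.0 / math.sqrt(2.0)
    L = [[0.0] * k for _ in range(2 * k)]
    for i in range(k):
        L[2 * i][i] = s
        L[2 * i + 1][i] = -s
    return L

def matvec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def g0(u, v):
    # Euclidean metric g_0(u, v)
    return sum(a * b for a, b in zip(u, v))

def T0_cubic(u):
    # T_0(u, u, u) for the standard 3-tensor T_0 = sum_i (dx_i)^3
    return sum(a ** 3 for a in u)

def check_lemma_5_4(k, trials=200, seed=0):
    """Largest observed |g_0(Lu, Lv) - g_0(u, v)| and |T_0(Lu, Lu, Lu)| over random u, v."""
    rng = random.Random(seed)
    L = lemma_5_4_matrix(k)
    metric_err, cubic_max = 0.0, 0.0
    for _ in range(trials):
        u = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        v = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        Lu, Lv = matvec(L, u), matvec(L, v)
        metric_err = max(metric_err, abs(g0(Lu, Lv) - g0(u, v)))
        cubic_max = max(cubic_max, abs(T0_cubic(Lu)))
    return metric_err, cubic_max

print(check_lemma_5_4(5))  # both numbers should be at rounding-error level
```

The random test is only an illustration; the exact identities are the two pullback formulas stated in the proof.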
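Completion of the proof of Theorem 5.1. Finally we take the embedding

$f_3: M^m \to \mathbb{R}^{N(m)} \oplus \mathbb{R}^{2N}$

defined by

$f_3(x) := A^{-1/3} f_1(x) \oplus (L_N \circ f_2)(x),$

where $L_N: \mathbb{R}^N \to \mathbb{R}^{2N}$ is the linear map of Lemma 5.4 (with $m+1$ replaced by $N$). Since $g_0$ and $T_0$ split as sums over the two summands, we get $f_3^*(g_0) = A^{-2/3} f_1^*(g_0) + f_2^*(g_0) = A^{-2/3} f_1^*(g_0) + g_1 = g$ and $f_3^*(A \cdot T_0) = f_1^*(T_0) + A \cdot f_2^*(L_N^*(T_0)) = T$. Since $f_2$ is an embedding, $f_3$ is the required embedding map for Theorem
5.1. □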
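At a single point $x$, the algebra behind $f_3$ reduces to linear algebra on tangent vectors. The Python sketch below illustrates this under stated assumptions and is not the paper's construction: a random matrix `J` stands in for $df_1(x)$ (so $T_x = f_1^*(T_0)_x$ is determined by `J`), a fixed positive definite matrix stands in for $g_x$, a Cholesky factor of $g_1$ stands in for $df_2(x)$ (an actual Nash embedding is not linear), and $A$ is found by doubling until $g_1 = g - A^{-2/3} f_1^*(g_0)$ is positive definite. All function names are hypothetical.

```python
import math
import random

def matvec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def T0_cubic(u):
    # T_0(u, u, u) with T_0 = sum_i (dx_i)^3
    return sum(a ** 3 for a in u)

def cholesky(S):
    """Lower-triangular C with C C^T = S; raises ValueError if S is not positive definite."""
    n = len(S)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                if s <= 1e-12:
                    raise ValueError("matrix is not positive definite")
                C[i][i] = math.sqrt(s)
            else:
                C[i][j] = s / C[j][j]
    return C

def df3_at_point(J, g):
    """Pointwise analogue of f_3 = A^{-1/3} f_1 (+) (L o f_2).

    J: N1 x m matrix standing in for df_1(x); g: m x m Gram matrix of g_x.
    Returns (A, D) where D stands in for df_3(x)."""
    m = len(g)
    G = [[sum(r[a] * r[b] for r in J) for b in range(m)] for a in range(m)]  # f_1^*(g_0) at x
    A = 1.0
    while True:
        g1 = [[g[a][b] - A ** (-2.0 / 3.0) * G[a][b] for b in range(m)] for a in range(m)]
        try:
            C = cholesky(g1)
            break
        except ValueError:
            A *= 2.0  # enlarge A until g_1 is positive definite
    K = [[C[b][a] for b in range(m)] for a in range(m)]  # K = C^T, so K^T K = g_1
    s, c = 1.0 / math.sqrt(2.0), A ** (-1.0 / 3.0)
    D = [[c * x for x in row] for row in J]               # block A^{-1/3} df_1
    for row in K:                                          # block L o df_2, L(y) = (y_i, -y_i)/sqrt(2)
        D.append([s * x for x in row])
        D.append([-s * x for x in row])
    return A, D

rng = random.Random(1)
J = [[rng.uniform(-3.0, 3.0) for _ in range(2)] for _ in range(4)]
g = [[1.0, 0.2], [0.2, 0.5]]
A, D = df3_at_point(J, g)
```

Checking $D^T D = g$ and $A \cdot T_0(Dv, Dv, Dv) = T_0(Jv, Jv, Jv)$ for a few vectors $v$ reproduces the two pullback identities at this point; the cube-root scaling $A^{-1/3}$ is what makes the factor $A$ in $A \cdot T_0$ cancel.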

5.5. Theorem. Suppose that $C$ is a compact subset of $\mathrm{Cap}_+^{4n-1}$. Then any
bounded domain $D$ in a linear statistical manifold $(\mathbb{R}^n, g_0, A \cdot T_0)$ can be realized
as an embedded statistical submanifold of $\mathrm{Cap}_+^{4n-1} \setminus C$.

We denote by $\mathrm{Cap}_+^3(\rho)$ the statistical manifold obtained by
restricting the statistical structure $(\mathbb{R}^4_+, g_0, \sum_{i=1}^4 (x_i)^{-1}(dx_i)^3)$ to the
positive quadrant $S^3_+(\rho)$ of the sphere of radius $\rho$ centered at the origin of
$\mathbb{R}^4$.

Proof of Theorem 5.5. We put

(5.6) $\Lambda := \max\{4\sqrt{n}, 4\sqrt{n} \cdot A\}.$
Let $U$ be an open neighborhood of $(\lambda, (2\Lambda)^{-1}, (2\Lambda)^{-1}, (2\Lambda)^{-1}) \in S^3(2/\sqrt{n})$.
Here $\lambda$ is the positive number such that

(5.7) $n\lambda^2 + 3n(2\Lambda)^{-2} = 4,$

i.e. such that this point lies on the sphere $S^3(2/\sqrt{n})$.
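The choices (5.6) and (5.7) can be evaluated directly. The sketch below (the function name `base_point` is hypothetical) computes $\Lambda$, $\lambda$ and the point $x_0 = (\lambda, (2\Lambda)^{-1}, (2\Lambda)^{-1}, (2\Lambda)^{-1})$, and reports $n|x_0|^2$, the Euclidean norm of the $n$-fold product point $(x_0, \dots, x_0) \in \mathbb{R}^{4n}$, and $\lambda \cdot 2\Lambda$, which enters the estimates of Sublemma 5.10. The sample values of $n$ and $A$ are arbitrary.

```python
import math

# Hypothetical helper: evaluates the choices (5.6) and (5.7); names are ours.
def base_point(n, A):
    Lam = max(4.0 * math.sqrt(n), 4.0 * math.sqrt(n) * A)      # Lambda from (5.6)
    lam = math.sqrt((4.0 - 3.0 * n * (2.0 * Lam) ** -2) / n)   # lambda > 0 solving (5.7)
    x0 = [lam] + [1.0 / (2.0 * Lam)] * 3                       # (lambda, (2 Lambda)^{-1}, ...)
    return Lam, lam, x0

for n, A in [(1, 0.5), (3, 2.0), (10, 7.5)]:
    Lam, lam, x0 = base_point(n, A)
    r2 = sum(c * c for c in x0)              # squared radius of x0 in R^4
    print(n, A, r2 * n, math.sqrt(n * r2), lam * 2.0 * Lam)
```

Up to rounding, the third printed column is $4$ (so $x_0 \in S^3(2/\sqrt{n})$) and the fourth is $2$; the last column stays above $8$ because $\Lambda \ge 4\sqrt{n}$ forces $\lambda > n^{-1/2}$.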

We denote by $U_+$ the intersection $U \cap \mathrm{Cap}_+^3(2/\sqrt{n})$. We now choose $U$ so small
that the product $U \times \cdots \times U$ ($n$ times) lies in the complement $S^{4n-1} \setminus C$.

Since $U_+$ is a statistical submanifold of $(\mathbb{R}^4_+, g_0, \sum_{i=1}^4 x_i^{-1}(dx_i)^3)$, the direct
product $U_+ \times \cdots \times U_+$ ($n$ times) is a statistical submanifold of $(\mathbb{R}^{4n}_+, g_0, \sum_{i=1}^{4n} x_i^{-1}(dx_i)^3)$.
Hence $U_+ \times \cdots \times U_+$ is a statistical submanifold of $(\mathrm{Cap}_+^{4n-1}, g^F, T^{A\text{-}C})$.

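We denote by $U(A, r)$ the ball of radius $r$ at the point $(\lambda, (2\Lambda)^{-1}, (2\Lambda)^{-1}, (2\Lambda)^{-1}) \in
S^3(2/\sqrt{n})$, where $\Lambda$ and $\lambda$ are given by (5.6) and (5.7), and we put $U_+(A, r) := U(A, r) \cap \mathrm{Cap}_+^3(2/\sqrt{n})$. First we prove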

5.8. Lemma. For given positive numbers $R > 0$ and $A > 0$ there exists a
positive number $r$ such that the bounded domain $[0, R] \times \cdots \times [0, R]$ ($n$ times) $\subset (\mathbb{R}^n, g_0, A \cdot T_0)$
can be realized as an immersed statistical submanifold of $U_+(A, r) \times \cdots \times
U_+(A, r)$ ($n$ times) $\subset (\mathrm{Cap}_+^{4n-1}, g^F, T^{A\text{-}C})$.

Proof of Lemma 5.8. Since $([0, R]^n, g_0, A \cdot T_0)$ is the direct product of $n$ copies of $([0, R], dx^2, A \cdot dx^3)$, and $U_+(A, r) \times \cdots \times U_+(A, r)$ carries the product statistical structure, it suffices to show that there is a statistical immersion
$f: ([0, R], dx^2, A \cdot dx^3) \to U_+(A, r)$. On $U_+(A, r)$ we consider the distribution
$D(p)$ defined by

$D_x(p) := \{v \in T_x U_+(A, r) : |v| = 1,\ T(v, v, v) = p\}$

for any given $p > 0$. Clearly, the existence of an immersion $f: ([0, R], dx^2, A \cdot
dx^3) \to U_+(A, r)$ is equivalent to the existence of an integral curve of
length $R$ of the distribution $D(A)$. The existence of the desired curve is a
consequence of the following lemma.

5.9. Lemma. There exists an embedded torus $T^2$ in $U_+(A, r)$ which is
provided with a unit vector field $V$ on $T^2$ such that $T(V, V, V) = A$.

Proof. Let us denote $x_0 := (\lambda, (2\Lambda)^{-1}, (2\Lambda)^{-1}, (2\Lambda)^{-1}) \in S^3(2/\sqrt{n})$ with $\lambda$
satisfying (5.7). We shall need the following

5.10. Sublemma. Let $H$ be any 2-dimensional subspace of $T_{x_0} U_+(A, r)$.
Then there exists a unit vector $w \in H$ such that $T(w, w, w) > 2A$.

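Proof of Sublemma 5.10. The subspace $H$ can be defined by two linear
equations:

(5.11) $\langle w, x_0 \rangle = 0,$

(5.12) $\langle w, h \rangle = 0.$

Here $w$ is a vector in $H \subset \mathbb{R}^4$ and $h$ is a unit vector in $\mathbb{R}^4$ which is not collinear
with $x_0$ and which is orthogonal to $H$. Adding a suitable multiple of $x_0$ to $h$ (which does not change $H$, since $x_0 \perp H$) and renormalizing, we can assume without loss of generality
that

$h = (h_1 = 0, h_2, h_3, h_4)$ and $|h| = 1$.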
Case 1. Suppose that the coordinates $h_2, h_3, h_4$ of $h$ are not all of the same sign;
permuting them if necessary, we may assume that $h_2 < 0$ and $h_3 > 0$. We put

$k_2 := \frac{-h_2}{\sqrt{h_2^2 + h_3^2}}, \quad k_3 := \frac{h_3}{\sqrt{h_2^2 + h_3^2}},$

(5.13) $w := (w_1,\ w_2 = (1 - \varepsilon_2) k_3,\ w_3 = (1 - \varepsilon_2) k_2,\ w_4 = 0).$

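The equation (5.12) for $w$ is obviously satisfied. Now we choose $w_1, \varepsilon_2$ from the
following equations, which are equivalent to (5.11) and the normalization $|w| = 1$ (recall that $k_2^2 + k_3^2 = 1$):

(5.14) $\lambda w_1 + (1 - \varepsilon_2)(2\Lambda)^{-1}(k_2 + k_3) = 0,$

(5.15) $w_1^2 = 2\varepsilon_2 - \varepsilon_2^2.$

From (5.14) we get

(5.16) $w_1 = -\frac{(1 - \varepsilon_2)(k_2 + k_3)}{2\lambda\Lambda}.$

Substituting this into (5.15), we get

(5.17) $\left(\frac{(k_2 + k_3)^2}{(2\lambda\Lambda)^2} + 1\right)\varepsilon_2^2 - \left(2 + \frac{2(k_2 + k_3)^2}{(2\lambda\Lambda)^2}\right)\varepsilon_2 + \left(\frac{k_2 + k_3}{2\lambda\Lambda}\right)^2 = 0.$

We take the one of the two solutions $\varepsilon_2$ of (5.17) which is

(5.18) $\varepsilon_2 = 1 - \left(1 + \left(\frac{k_2 + k_3}{2\lambda\Lambda}\right)^2\right)^{-1/2}.$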

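From (5.18) we get

$\varepsilon_2 < \frac{2}{(2\lambda\Lambda)^2}, \quad |w_1| < \frac{2}{2\lambda\Lambda},$

since $0 < k_2 + k_3 < 2$ (this follows from the definition of $k_2, k_3$), and $\lambda > n^{-1/2}$ (this follows
from (5.6) and (5.7)), so $\lambda \cdot 2\Lambda > 8$. Since $w_4 = 0$,

$T_{x_0}(w, w, w) = \lambda^{-1} w_1^3 + 2\Lambda(w_2^3 + w_3^3),$

and since $k_2^3 + k_3^3 \ge 1/\sqrt{2}$, $2\Lambda \ge 8\sqrt{n}\max\{1, A\}$ and $\lambda^{-1} < \sqrt{n}$, we get

$T(w, w, w) \ge 8\sqrt{n}\max\{1, A\}(1 - \varepsilon_2)^3 \cdot \frac{1}{\sqrt{2}} - \sqrt{n}\,|w_1|^3 > 2A.$

So in Case 1 Sublemma 5.10 holds.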
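As a sanity check on the Case 1 construction, the following Python sketch evaluates (5.13)-(5.18) for one sample choice of $n$, $A$ and $h$ (the sample values and the helper names `base_point`, `case1_vector`, `T_cubic`, `dot` are ours, chosen only for illustration) and reports $|w|^2$, $\langle w, x_0 \rangle$, $\langle w, h \rangle$ and $T_{x_0}(w, w, w)$ next to $2A$. Here $T$ is taken to be $\sum_i x_i^{-1}(dx_i)^3$, and $h$ need not be normalized because only the ratio $h_2 : h_3$ enters. A single sample is not a proof.

```python
import math

# Hypothetical helpers (names are ours) for a numerical check of Case 1.
def base_point(n, A):
    Lam = max(4.0 * math.sqrt(n), 4.0 * math.sqrt(n) * A)      # (5.6)
    lam = math.sqrt((4.0 - 3.0 * n * (2.0 * Lam) ** -2) / n)   # (5.7)
    return Lam, lam, [lam] + [1.0 / (2.0 * Lam)] * 3

def T_cubic(x, w):
    # T_x(w, w, w) for T = sum_i x_i^{-1} (dx_i)^3
    return sum(wi ** 3 / xi for xi, wi in zip(x, w))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def case1_vector(h, lam, Lam):
    """w from (5.13)-(5.18), assuming h = (0, h2, h3, h4) with h2 < 0 < h3."""
    _, h2, h3, _ = h
    r = math.hypot(h2, h3)
    k2, k3 = -h2 / r, h3 / r                        # k2, k3 > 0 and k2^2 + k3^2 = 1
    q = (k2 + k3) / (2.0 * lam * Lam)
    eps2 = 1.0 - 1.0 / math.sqrt(1.0 + q * q)       # (5.18)
    w1 = -(1.0 - eps2) * q                          # (5.16)
    return [w1, (1.0 - eps2) * k3, (1.0 - eps2) * k2, 0.0]   # (5.13)

n, A = 3, 2.0
Lam, lam, x0 = base_point(n, A)
h = [0.0, -0.6, 0.5, 0.2]          # sample h with h1 = 0 and h2 < 0 < h3
w = case1_vector(h, lam, Lam)
print(dot(w, w), dot(w, x0), dot(w, h), T_cubic(x0, w), 2 * A)
```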

Case 2. Now we assume that $h_1 = 0$ and $h_2 \ge h_3 \ge h_4 > 0$ (replacing $h$ by $-h$ and permuting the last three coordinates if necessary; such a permutation fixes $x_0$ and $T$). We set

$w := (w_1,\ w_2 = -a(1 - \varepsilon_2),\ w_3 = -a(1 - \varepsilon_2),\ w_4 = \alpha a (1 - \varepsilon_2)),$ where

(5.19) $\alpha := \frac{h_2 + h_3}{h_4} \ge 2,$

(5.20) $a > 0$ such that $a^2(\alpha^2 + 2) = 1.$

The definition (5.19) ensures that $\langle w, h \rangle = 0$. In order that $w$ is a unit vector
and $w \in H$, the numbers $w_1, \varepsilon_2$ must satisfy the following equations:

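(5.21) $\lambda w_1 + (1 - \varepsilon_2)(\alpha - 2) a (2\Lambda)^{-1} = 0,$

(5.22) $w_1^2 = 2\varepsilon_2 - \varepsilon_2^2.$

From (5.21) we get

(5.23) $w_1 = -\frac{(1 - \varepsilon_2)(\alpha - 2) a}{2\lambda\Lambda}.$

Now substituting (5.23) into (5.22) we get the solution

$\varepsilon_2 = \frac{(1 + B^2) - \sqrt{(1 + B^2)^2 - B^2(1 + B^2)}}{1 + B^2} = 1 - (1 + B^2)^{-1/2},$

where

$B := \frac{(\alpha - 2) a}{2\lambda\Lambda} < \frac{1}{2\lambda\Lambda}.$

Since $0 \le (\alpha - 2) a < 1$, we have $\varepsilon_2 < 1/(2\lambda\Lambda)$ and $|w_1| < 1/(2\lambda\Lambda)$. Since moreover $a^3(\alpha^3 - 2) \ge 6^{-1/2}$ for $\alpha \ge 2$, we obtain

$T(w, w, w) = \lambda^{-1} w_1^3 + 2\Lambda\, a^3 (1 - \varepsilon_2)^3 (\alpha^3 - 2) > 2A.$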



□ 
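The Case 2 construction can be checked in the same way. The sketch below reuses the same hypothetical helpers (redefined so that the block is self-contained) and evaluates (5.19)-(5.23) for two sample vectors $h$, one of them the boundary case $\alpha = 2$, where $B = 0$, $\varepsilon_2 = 0$ and $w_1 = 0$. The sample values are ours and only illustrate the formulas.

```python
import math

# Hypothetical helpers (names are ours) for a numerical check of Case 2.
def base_point(n, A):
    Lam = max(4.0 * math.sqrt(n), 4.0 * math.sqrt(n) * A)      # (5.6)
    lam = math.sqrt((4.0 - 3.0 * n * (2.0 * Lam) ** -2) / n)   # (5.7)
    return Lam, lam, [lam] + [1.0 / (2.0 * Lam)] * 3

def T_cubic(x, w):
    # T_x(w, w, w) for T = sum_i x_i^{-1} (dx_i)^3
    return sum(wi ** 3 / xi for xi, wi in zip(x, w))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def case2_vector(h, lam, Lam):
    """w from (5.19)-(5.23), assuming h = (0, h2, h3, h4) with h2 >= h3 >= h4 > 0."""
    _, h2, h3, h4 = h
    alpha = (h2 + h3) / h4                           # (5.19), alpha >= 2
    a = 1.0 / math.sqrt(alpha ** 2 + 2.0)            # (5.20)
    B = (alpha - 2.0) * a / (2.0 * lam * Lam)
    eps2 = 1.0 - 1.0 / math.sqrt(1.0 + B * B)        # root of (5.22) after inserting (5.23)
    w1 = -(1.0 - eps2) * B                           # (5.23)
    t = a * (1.0 - eps2)
    return [w1, -t, -t, alpha * t]

n, A = 3, 2.0
Lam, lam, x0 = base_point(n, A)
samples = ([0.0, 0.7, 0.5, 0.3], [0.0, 0.5, 0.5, 0.5])   # second one has alpha = 2
ws = [(h, case2_vector(h, lam, Lam)) for h in samples]
for h, w in ws:
    print(dot(w, w), dot(w, x0), dot(w, h), T_cubic(x0, w), 2 * A)
```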



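Completion of the proof of Lemma 5.9. First we choose a small embedded
torus $T^2$ in $U_+(A, r)$ such that for all $x \in T^2$ we have

(5.24) $\max\{T(v, v, v) : v \in T_x T^2,\ |v| = 1\} > \frac{3}{2}A.$

This is possible thanks to Sublemma 5.10 and the continuity of $T$: the estimate of Sublemma 5.10 holds for every 2-plane at $x_0$, hence (5.24) holds for all tangent planes at points sufficiently close to $x_0$. Since $T(-v, -v, -v) = -T(v, v, v)$ and
$T^2 = \mathbb{R}^2/\mathbb{Z}^2$ is parallelizable, (5.24) implies that we can find a smooth vector
field $V$ on $T^2$ satisfying the condition in Lemma 5.9. □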
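The pointwise step behind this argument is the intermediate value theorem on the unit circle of a tangent plane: because $v \mapsto T(v, v, v)$ is odd, a maximum above $A > 0$ forces the value $A$ to be attained. The Python sketch below assumes that $T$ restricted to the plane is written, in an orthonormal basis, as a binary cubic with hypothetical coefficients `c = (c0, c1, c2, c3)`; it locates an angle where the value exceeds the target by sampling and then bisects between that angle and the antipodal one. It only illustrates the choice at one point; the smooth choice over all of $T^2$, which uses the parallelizability of the torus, is not modeled.

```python
import math

# Hypothetical illustration of the odd-function argument behind (5.24).
# T restricted to a 2-plane, in an orthonormal basis (e1, e2), is a binary cubic:
# T(v, v, v) = c0 x^3 + 3 c1 x^2 y + 3 c2 x y^2 + c3 y^3 for v = x e1 + y e2.

def cubic_on_circle(c, theta):
    x, y = math.cos(theta), math.sin(theta)
    c0, c1, c2, c3 = c
    return c0 * x ** 3 + 3 * c1 * x * x * y + 3 * c2 * x * y * y + c3 * y ** 3

def unit_vector_with_value(c, target, samples=3600, iters=80):
    """Unit vector v in the plane with T(v, v, v) = target (up to rounding),
    assuming target > 0 and that T exceeds target somewhere on the unit circle."""
    thetas = [2 * math.pi * k / samples for k in range(samples)]
    t_max = max(thetas, key=lambda t: cubic_on_circle(c, t))
    if cubic_on_circle(c, t_max) <= target:
        raise ValueError("T does not exceed the target value on this plane")
    # Oddness: the value at t_max + pi equals -T(v) < 0 < target.
    lo, hi = t_max, t_max + math.pi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cubic_on_circle(c, mid) > target:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return math.cos(theta), math.sin(theta)

v = unit_vector_with_value((2.0, 0.5, -0.3, 1.0), 1.0)
```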

Completion of the proof of Theorem 5.5. For a given $A$ in Theorem 5.5 we let
$A' := A + \varepsilon$ for some small positive $\varepsilon$ and apply Lemma 5.8 to $(R, A')$, which
in fact amounts to applying Lemma 5.9. The existence of an isostatistical
immersion $f': ([0, R], dx^2, A' \cdot dx^3) \to U_+(A', r)$ implies the existence of an
isostatistical embedding $f: ([0, R], dx^2, A \cdot dx^3) \to U_+(A', r)$, by the same
argument as in the proof of Proposition 4.4 a). □
Proof of Main Theorem for the non-compact case. We can deal with this case
by using a compact decomposition of $M^m$, as Nash did for isometric
embeddings in the smooth case [10]. Namely, we cover $M^m$ by disk neighborhoods
$N_i^j$, $j = 1, \dots, m+1$, in the following way. For each $j$ let

(5.25) $C^j := \bigcup_i N_i^j.$

We require that the union in (5.25) is a disjoint union, i.e. $N_i^j \cap N_k^j = \emptyset$ if
$i \ne k$. We also require that each $N_i^j$ overlaps only a finite number of other $N_k^l$.
Now we "compactify" $N_i^j$ via a surjective smooth mapping $\phi_i^j: N_i^j \to S_i^j$,
where $S_i^j$ is a sphere of the same dimension $m$. The map $\phi_i^j$ can be extended
to the whole of $M^m$, since it maps the boundary of $N_i^j$ to the north pole of
the sphere. On the other hand, $\phi_i^j$ is injective on a sufficiently large
subdomain $\tilde N_i^j \subset N_i^j$. Furthermore, we can use a partition of unity to
define a $C^1$ statistical structure on each $S_i^j$ such that the sum of the pullbacks
via the $\phi_i^j$ is the given statistical structure on $M$. In other words, we can consider
the $C^1$ statistical structure on $M^m$ as induced from (infinitely many) spheres
$S_i^j$ via the smooth mappings $\phi_i^j$.

Now for each $j$, $1 \le j \le m + 1$, we put

$S^j := \bigcup_i S_i^j.$

Using Theorem 5.1 we can find an isostatistical embedding

$\psi_j: S^j \to \left(S_+^{N-1}\left(\tfrac{1}{\sqrt{m+1}}\right), i^*(g_0), i^*(T_S)\right)$

inductively, since each $S_i^j$ is compact.
Now let us consider the map

$I_j: M^m \to \left(S_+^{N-1}\left(\tfrac{1}{\sqrt{m+1}}\right), i^*(g_0), i^*(T_S)\right),$

defined as the composition of the map $\phi^j: M \to S^j$ and the map $\psi_j$.
Finally, the product mapping

$I = I_1 \times \cdots \times I_{m+1}$

is the desired isostatistical embedding, since the statistical structure on $S_+^{(m+1)N-1}(1)$
is induced from the (nonlinear) statistical structure on $\mathbb{R}^{(m+1)N}$. We can see this
easily by noticing that both symmetric forms $g_0$ and $T_S$ are decomposable with respect to
the embedding $i: \mathbb{R}^n \to \mathbb{R}^{n+l}$ for any $n, l > 0$, i.e.

$g_0(\mathbb{R}^n) = i^*(g_0)(\mathbb{R}^{n+l}), \quad T_S(\mathbb{R}^n) = i^*(T_S)(\mathbb{R}^{n+l}).$ □